R. Duncan Luce once mentioned in a conversation that he did not consider Kolmogorov’s probability
theory well-constructed because it treats stochastic independence as a “numerical accident,”
while it should be treated as a fundamental relation, more basic than the assignment of numerical 
probabilities. I argue here that stochastic independence is indeed a “numerical accident,” a special 
form of stochastic dependence between random variables (most broadly defined). The idea that it is 
fundamental may owe its attractiveness to the confusion of stochastic independence with stochastic 
unrelatedness, the situation when two or more random variables have no joint distribution, “have 
nothing to do with each other.” Kolmogorov’s probability theory cannot be consistently constructed 
without allowing for stochastic unrelatedness, in fact making it a default situation: any two random 
variables recorded under mutually incompatible conditions are stochastically unrelated. However, 
stochastically unrelated random variables can always be probabilistically coupled, i.e., a joint
distribution can be imposed on them, and this generally can be done in an infinity of ways, the independent
coupling being merely one of them. The notions of stochastic unrelatedness and all possible couplings play
a central role in the foundation of probability theory and, especially, in the theory of probabilistic 
contextuality. 
I. INTRODUCTION

Almost 15 years ago R. Duncan Luce mentioned in a 
conversation that the Kolmogorovian probability theory 
(KPT) was unsatisfactory because it treated stochastic
independence as a “numerical accident” rather than a
fundamental relation. If I roll a die today in Irvine, California,
Duncan said, and on another day you roll a die in Lafayette, 
Indiana, the fact that the two outcomes are independent 
cannot be established by checking the multiplication rule. 
On the contrary, the applicability of the multiplication rule 
in this case is justified by determining that the two dice are 
stochastically independent, “have nothing to do with each 
other.” 

This simple example (some may think too simple to be
of great interest) leads us to the very foundations of probability
theory. Let us try to understand it clearly by comparing
it to another example. It is about a situation when
I repeatedly roll a single die, having defined two random
variables:

$$A = \begin{cases} 1 & \text{if the outcome is even} \\ 0 & \text{otherwise} \end{cases}, \qquad
B = \begin{cases} 1 & \text{if the outcome exceeds 3} \\ 0 & \text{otherwise} \end{cases}.$$

These two random variables co-occur in the most obvious
empirical meaning: the values of A and B are always observed
together, at every roll of the die. Another way of
looking at it, the two random variables co-occur because
they are functions of one and the same “background” random
variable Z, the outcome of rolling the die. As a result,
I can estimate from the observations the probabilities
Pr [A = 1 and B = 1], Pr [A = 1], and Pr [B = 1] (I will use
Pr as a symbol for probability throughout this paper): if
the joint probability turns out to be the product of the
two marginal ones (statistical issues aside), the two events
are determined to be independent. I cannot simply make
this determination a priori, as it depends on what die I am
rolling: if it is a fair die, A and B are not independent, but
if the distribution of the outcomes is

value   :  1    2    3    4    5    6
pr.mass :  0   1/4  1/4  1/4  1/4   0  ,

then A and B are independent. 
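
The two cases can be checked by direct enumeration. The following is a minimal sketch, assuming that the rigged die puts mass 1/4 on each of the outcomes 2, 3, 4, 5 and mass 0 on 1 and 6; the function names `probabilities` and `satisfies_multiplication_rule` are illustrative and do not come from the text.

```python
from fractions import Fraction


def a(outcome):
    """A = 1 if the outcome is even, 0 otherwise."""
    return 1 if outcome % 2 == 0 else 0


def b(outcome):
    """B = 1 if the outcome exceeds 3, 0 otherwise."""
    return 1 if outcome > 3 else 0


def probabilities(pmf):
    """Return (Pr[A=1], Pr[B=1], Pr[A=1 and B=1]) for a pmf on outcomes 1..6."""
    p_a = sum(p for z, p in pmf.items() if a(z) == 1)
    p_b = sum(p for z, p in pmf.items() if b(z) == 1)
    p_ab = sum(p for z, p in pmf.items() if a(z) == 1 and b(z) == 1)
    return p_a, p_b, p_ab


def satisfies_multiplication_rule(pmf):
    """For two binary variables, checking the single cell (1, 1) is enough:
    the other three cells of the 2x2 table then factor as well."""
    p_a, p_b, p_ab = probabilities(pmf)
    return p_ab == p_a * p_b


# Fair die: every outcome has mass 1/6.
fair = {z: Fraction(1, 6) for z in range(1, 7)}

# Rigged die (assumed masses, see the lead-in above).
rigged = {1: Fraction(0), 2: Fraction(1, 4), 3: Fraction(1, 4),
          4: Fraction(1, 4), 5: Fraction(1, 4), 6: Fraction(0)}

print(probabilities(fair), satisfies_multiplication_rule(fair))
print(probabilities(rigged), satisfies_multiplication_rule(rigged))
```

Exact fractions are used so that the product test is not affected by floating-point rounding. With the fair die the joint probability is 1/3 against a product of 1/4; with the rigged die both equal 1/4.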

The difference between this example and Duncan
Luce’s is not in the number of the dice being rolled: my
example would not change too much if I roll two dice together,
having marked them “Left” and “Right,” and define
the random variables as

$$A = \begin{cases} 1 & \text{if the Left outcome is even} \\ 0 & \text{otherwise} \end{cases}, \qquad
B = \begin{cases} 1 & \text{if the Right outcome exceeds 3} \\ 0 & \text{otherwise} \end{cases}.$$

The realizations of A and B again come together, this time
the empirical meaning of the “togetherness” being “in the
same trial,” or “simultaneously.” Again, one can also say
that the two random variables co-occur because they are
functions of one and the same “background” random variable
Z, only this time it is the pair of values rather than
a single one. And again, I can estimate from the observations
the probabilities Pr [A = 1 and B = 1], Pr [A = 1],
and Pr [B = 1] and check their adherence to the multiplication
rule. Whether the two random variables are stochastically
independent is determined by the outcome of this test:
the dice may very well be rigged not to be independent.

In Duncan Luce’s example the situation is very different:
the outcomes of rolling the two dice in two different places
at two different times have no empirically defined pairing.
If I define my random variables as

$$A = \begin{cases} 1 & \text{if on Tuesday in Irvine the outcome is even} \\ 0 & \text{otherwise} \end{cases}, \qquad
B = \begin{cases} 1 & \text{if on Friday in Lafayette the outcome exceeds 3} \\ 0 & \text{otherwise} \end{cases},$$

then I can estimate empirically the probabilities Pr [A = 1]
and Pr [B = 1] and find out, e.g., that they are (statistical
issues aside) 0.7 and 0.5, respectively. But I cannot
estimate empirically Pr [A = 1 and B = 1]: the two random
variables are not recorded in pairs. The experiment
involves no empirical procedure by which one could find
which value of B should be paired with which value of
A. The two random variables therefore do not have an
observable (estimable from frequencies) joint distribution;
they cannot be presented as functions of one and the same
“background” random variable. What one can do, however,
is to declare the two random variables stochastically
independent, based on one’s understanding that they “have
nothing to do with each other.” If one does so, the equality
of Pr [A = 1 and B = 1] to the product of the two
individual probabilities holds by construction, requiring no
empirical testing and allowing for no empirical falsification.

This was Duncan Luce’s point: while the KPT defines
stochastic independence through the multiplication rule,
at least in some cases the determination of independence
precedes and justifies the applicability of the multiplication
rule. In Duncan Luce’s opinion, this warranted treating
stochastic independence as a fundamental, “qualitative”
relation preceding assignment of numerical probabilities.
This opinion is in accordance with the general precepts of
the representational theory of measurement. Thus, the authors
of the first volume of Foundations of Measurement
(Krantz et al., 1971) sympathetically refer to Zoltan Domotor’s
1969 dissertation in which he axiomatized probability
theory treating stochastic independence as a primitive
relation. As far as I know, however, it has not translated
into a viable alternative to the KPT.

I accept Duncan Luce’s example as posing a genuine 
foundational problem, but I disagree that this problem is 
about defining independence by means other than the multiplication
rule. The position I advocate below in this paper
is as follows. 

1. Random variables that “have nothing to do with 
each other” are defined on different domains (sample 
spaces). Rather than being independent (which is a 
form of a joint distribution), they are stochastically 
unrelated, i.e., they possess no joint distribution. 

2. It is not that we do not know the “true” distribution, 
or that in “truth” they are independent but we do not 
know how to justify this. A joint distribution simply 
is not defined (until imposed by us in one of multiple 
ways, discussed below). 

3. The KPT is consistent with the idea of multiple sample
spaces and in fact requires it for internal consistency:
the idea of a single sample space for all random
variables imaginable is mathematically untenable.

4. Any given set of pairwise stochastically unrelated random
variables can always be coupled, i.e., a joint distribution
can be imposed on them. This is equivalent to inventing a
pairing scheme for their realizations, and this can be
done in multiple ways, coupling them as independent
random variables being just one of them.

II. ON RANDOM VARIABLES, 
UNRELATEDNESS, AND INDEPENDENCE 

II.1. Informal introduction

Stochastic unrelatedness is easy to distinguish from
stochastic independence: the latter assumes the existence
of a joint distribution, which means that an empirical procedure
exists by which each realization of one random variable
can be paired (coupled) with that of another. The
most familiar forms of empirical coupling are co-occurrence
in the same trial and co-relation to the same person. In the
table below,

$$\begin{array}{rcccccc}
c: & 1 & 2 & 3 & 4 & 5 & \ldots \\
X: & x_1 & x_2 & x_3 & x_4 & x_5 & \ldots \\
Y: & y_1 & y_2 & y_3 & y_4 & y_5 & \ldots
\end{array}, \tag{1}$$

the indexing entity c can be the number of a trial (as in
repeatedly rolling two marked dice together) or an ID of
a person (as in relating heights and weights, or weights
before and after dieting). The random variables X and
Y here have a joint distribution: one can, e.g., estimate
the probability with which X falls within an event $E_X$ and
(“simultaneously”) Y falls within an event $E_Y$; and if

$$\Pr[X \in E_X \;\&\; Y \in E_Y] = \Pr[X \in E_X] \Pr[Y \in E_Y], \tag{2}$$

for any two such events $E_X$, $E_Y$, then X and Y are considered
independent.

Suppose, however, that the information about c in (1)
does not exist, and all one has is some set of values for X
and some set of values for Y. Clearly, now the “togetherness”
of $X \in E_X$ and $Y \in E_Y$ is undefined. Although
$\Pr[X \in E_X]$ and $\Pr[Y \in E_Y]$ have the same meaning as
before, $\Pr[X \in E_X \;\&\; Y \in E_Y]$ is undefined, and (2) cannot
be tested. This is what stochastic unrelatedness is: lack of
a joint distribution. A pair of stochastically unrelated random
variables are neither independent nor interdependent;
these terms do not apply.

Think, e.g., of a list of weights in some group of people
before dieting (X) and a list of weights in some other
group of people after dieting (Y): which value of X
should be paired with which value of Y to try to estimate
$\Pr[X \in E_X \;\&\; Y \in E_Y]$? Any pairing one can impose here
will be as good as any other pairing, and none of them is
determined by the empirical procedures involved (weighing
people in the two groups). This simple example has
counterparts in all experiments where random variables are
recorded under two or more mutually exclusive conditions.

A question arises: couldn’t one nevertheless treat
stochastically unrelated X and Y as if they were independent?
The answer is affirmative, but so is the answer
to the question whether X and Y can be treated as if they
were not independent. Treating the outcomes of rolling dice
in Irvine on Tuesday (X) and in Lafayette on Friday (Y)
as if they were jointly distributed means constructing another
pair of random variables, $(\tilde{X}, \tilde{Y})$, this time a jointly
distributed one, such that $\tilde{X}$ and $\tilde{Y}$ taken separately are
distributed as X and Y, respectively. Such constructions
form the subject of a special branch of probability theory 
called the theory of coupling(s) (Thorisson, 2000). 
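
For two binary random variables the whole family of couplings can be written down explicitly. The sketch below takes the marginals 0.7 and 0.5 from the Irvine and Lafayette example in the Introduction and asks which values of the joint probability Pr [A = 1 and B = 1] yield a valid joint distribution. The range used here is the standard Fréchet bound for a 2x2 table, which is not stated in the text and is brought in as an assumption for illustration; the function names are hypothetical.

```python
from fractions import Fraction


def coupling_table(p_a, p_b, p11):
    """2x2 joint pmf with Pr[A=1] = p_a, Pr[B=1] = p_b, Pr[A=1 and B=1] = p11."""
    return {
        (1, 1): p11,
        (1, 0): p_a - p11,
        (0, 1): p_b - p11,
        (0, 0): 1 - p_a - p_b + p11,
    }


def is_valid_coupling(table):
    """A table is a joint distribution if all masses are nonnegative and sum to 1."""
    return all(p >= 0 for p in table.values()) and sum(table.values()) == 1


def frechet_bounds(p_a, p_b):
    """Smallest and largest admissible values of Pr[A=1 and B=1]."""
    return max(Fraction(0), p_a + p_b - 1), min(p_a, p_b)


p_a, p_b = Fraction(7, 10), Fraction(1, 2)
low, high = frechet_bounds(p_a, p_b)
independent = p_a * p_b  # the independent coupling is one point in [low, high]

for p11 in (low, independent, high):
    print(p11, is_valid_coupling(coupling_table(p_a, p_b, p11)))
```

Every value between the two bounds gives a legitimate coupling with the same marginals; the product value is merely one of them.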

Let, e.g., both dice be fair. One can always construct
$(\tilde{X}, \tilde{Y})$ by assigning probability mass 1/36 to each of the 36
pairs. This pair $(\tilde{X}, \tilde{Y})$ is the independent coupling of X
and Y. Its choice corresponds to pairing every realization of
X with every realization of Y (or with a uniformly randomly
chosen realization of Y). There is, however, no reason to
single out the independent coupling. One can also make
$\tilde{X}$ and $\tilde{Y}$ perfectly correlated or perfectly anticorrelated by
assigning the probability masses as, respectively,

$$\text{pr.mass}\left[\tilde{X} = x \;\&\; \tilde{Y} = y\right] = \begin{cases} 0 & \text{if } x \neq y \\ 1/6 & \text{if } x = y \end{cases}$$

or

$$\text{pr.mass}\left[\tilde{X} = x \;\&\; \tilde{Y} = y\right] = \begin{cases} 0 & \text{if } x + y \neq 7 \\ 1/6 & \text{if } x + y = 7 \end{cases},$$

where $x, y \in \{1, \ldots, 6\}$.

These couplings correspond to pairing each realization
of X with only one specific realization of Y. If the dice
producing X and Y have outcomes with different distributions,
a perfectly correlated or anticorrelated coupling will
not be possible, while the independent coupling will, as it is
universally applicable. But the independent coupling still
will not be the only possible one (unless one of the dice
is rigged to roll a single outcome, in which case the only
possible coupling can be viewed as independent, perfectly
correlated, or perfectly anticorrelated).

One may be tempted to think that the “true” pairing
should involve ordering the observations of X and Y
chronologically and pairing the outcomes with the same
trial number. A brief reflection should show, however,
that this is an arbitrary choice: what theoretical principles
would compel one to pair the first realization of X
with the first realization of Y (occurring, in Duncan Luce’s
example, at another time in another place), rather than
with the tenth one, the last one, or one having the same
quantile rank? Recall also that the chronological sequences
need not be defined to begin with: instead of rolling a single
die repeatedly one could roll a large number of identical
dice and count the events.

Summarizing, a joint distribution for empirically observed
X and Y exists only if there is an empirical procedure
for coupling their realizations, such as relating them
to one and the same value of c in (1). Otherwise X and
Y are stochastically unrelated. When they are, one can
impose on them a joint distribution by creating a coupling
$(\tilde{X}, \tilde{Y})$ for X and Y “on paper.” The individual distributions
of stochastically unrelated X and Y do impose some
constraints on possible joint distributions of $(\tilde{X}, \tilde{Y})$, but,
except in degenerate cases, do not determine it uniquely.
The independent coupling is not the only possible coupling
of stochastically unrelated random variables.

II.2. Formalizing the “naive” account of random
variables

Random variables are defined by their distributions (say,
probability masses associated with every possible roll of a
die) and, in order to distinguish different variables having
the same distribution, by their unique names (e.g., “the
outcome of the die rolled in Irvine on Tuesday”). On a more
general level, the distribution of a random variable called
X is a probability space $(S_X, \Sigma_X, \mu_X)$, with the standard
meaning of the terms: $S_X$ is the set of possible values for X,
$\Sigma_X$ is a sigma-algebra of subsets of $S_X$, and $\mu_X$ a probability
measure.^1 For each element $E_X$ of $\Sigma_X$ (an event) we define
the probability of X “falling in $E_X$” or “satisfying $E_X$” as

$$\Pr[X \in E_X] = \mu_X(E_X). \tag{3}$$


Given another random variable, called Y and distributed
as $(S_Y, \Sigma_Y, \mu_Y)$, we say it is jointly distributed with X
if there is a random variable Z = (X, Y) whose name
is “ordered pair of X and Y” and whose distribution is
$(S_X \times S_Y, \Sigma_X \otimes \Sigma_Y, \nu)$, subject to

$$\begin{aligned}
\nu(E_X \times S_Y) &= \mu_X(E_X), \\
\nu(S_X \times E_Y) &= \mu_Y(E_Y),
\end{aligned} \tag{4}$$

for any events $E_X \in \Sigma_X$ and $E_Y \in \Sigma_Y$. The meaning of
$\Sigma_X \otimes \Sigma_Y$ is the smallest sigma-algebra on $S_X \times S_Y$ that
contains pairwise products of events in $\Sigma_X$ and $\Sigma_Y$.

If such a Z = (X, Y) exists (is defined among the random
variables one considers), then the joint distribution of
X and Y is unique. The existence of this random variable,
however, is not established by a mathematical derivation
from the properties of X and Y; it is determined by the
existence of an empirical procedure in which the realizations
of X and Y are observed “together.” If Z = (X, Y) does
not exist, one can always construct a coupling $\tilde{Z} = (\tilde{X}, \tilde{Y})$
whose distribution is $(S_X \times S_Y, \Sigma_X \otimes \Sigma_Y, \nu)$, subject to (4).
The only difference (but a critical one) is that the
name of this $\tilde{Z}$ is not “ordered pair of X and Y” but “ordered
pair of $\tilde{X}$ [whose distribution is the same as that of
X] and $\tilde{Y}$ [whose distribution is the same as that of Y].”
Such a $\tilde{Z}$ can be freely introduced and does not change the
X and Y being coupled; in fact, $\tilde{Z}$ is stochastically
unrelated to X and to Y.

^1 I could have said “distribution is determined by $(S_X, \Sigma_X, \mu_X)$,” but
it is simpler to say “distribution is $(S_X, \Sigma_X, \mu_X)$,” as we do not have
an independent general definition of a distribution.

All of this can be easily generalized to an arbitrary set 
of random variables (see, e.g., Dzhafarov and Kujala, in 
press). 
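
The marginal conditions (4) can be verified numerically for the three couplings of two fair dice described in Section II.1. This is a sketch only: it assumes probability mass 1/36 per pair for the independent coupling and 1/6 on the diagonal x = y or on the anti-diagonal x + y = 7 for the other two, and the helper names are illustrative. On a finite set it is enough to check (4) on single outcomes, since all other events follow by additivity.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
SIXTH = Fraction(1, 6)
PAIRS = list(product(FACES, FACES))


def independent_coupling():
    return {(x, y): Fraction(1, 36) for x, y in PAIRS}


def correlated_coupling():
    return {(x, y): SIXTH if x == y else Fraction(0) for x, y in PAIRS}


def anticorrelated_coupling():
    return {(x, y): SIXTH if x + y == 7 else Fraction(0) for x, y in PAIRS}


def marginals(joint):
    """Marginal masses nu({x} x S_Y) and nu(S_X x {y}) of a joint pmf."""
    m_x = {x: sum(joint[(x, y)] for y in FACES) for x in FACES}
    m_y = {y: sum(joint[(x, y)] for x in FACES) for y in FACES}
    return m_x, m_y


def is_coupling_of_fair_dice(joint):
    """Condition (4) for two fair dice, checked on singletons."""
    m_x, m_y = marginals(joint)
    return (all(p == SIXTH for p in m_x.values())
            and all(p == SIXTH for p in m_y.values()))


def covariance(joint):
    e_x = sum(x * p for (x, _), p in joint.items())
    e_y = sum(y * p for (_, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y


for make in (independent_coupling, correlated_coupling, anticorrelated_coupling):
    joint = make()
    print(make.__name__, is_coupling_of_fair_dice(joint), covariance(joint))
```

All three tables satisfy (4) with the same marginals, yet their covariances differ, which is the sense in which the marginals alone do not single out a coupling.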

II.3. Random variables and joint distributions in 
KPT 

The formal account just given is not that of the traditional
KPT. The latter begins with the notion of a sample
space, $(S, \Sigma, \mu)$,^2 and defines a random variable X as
a measurable mapping of this space into a measurable
space $(S_X, \Sigma_X)$, i.e., a function $X : S \to S_X$ such that
$X^{-1}(E_X) \in \Sigma$ for any $E_X \in \Sigma_X$.^3 The mapping induces
on the codomain space $(S_X, \Sigma_X)$ a probability measure $\mu_X$,
by the rule

$$\Pr[X \in E_X] = \mu_X(E_X) = \mu\left(X^{-1}(E_X)\right), \tag{5}$$

for any $E_X \in \Sigma_X$. The resulting triple $(S_X, \Sigma_X, \mu_X)$
is called the distribution of X. If another measurable
mapping Y is defined on the same sample space, mapping
it into a codomain space $(S_Y, \Sigma_Y)$ and resulting in
the distribution $(S_Y, \Sigma_Y, \mu_Y)$, then their joint distribution
$(S_X \times S_Y, \Sigma_X \otimes \Sigma_Y, \nu)$ is derived from the relation

$$\Pr[X \in E_X \;\&\; Y \in E_Y] = \mu\left(X^{-1}(E_X) \cap Y^{-1}(E_Y)\right), \tag{6}$$

for any $E_X \in \Sigma_X$ and $E_Y \in \Sigma_Y$. Note that unlike in the
“naive” approach above, the joint distribution of X and Y
here is mathematically derived from their individual definitions
as measurable functions on the same sample space.

Clearly, any two random variables defined on the same
sample space are jointly distributed. This may create a
temptation to assume the existence of a common sample
space and a joint distribution (even if unknown to us) for
any two random variables. In turn, this would mean the
existence of a common sample space for all possible random
variables, so that random variables in any set under
consideration possess a joint distribution, and this distribution
is unique. Kolmogorov’s (1933/1956) book may seem
to reinforce this view, as it does not explicitly speak of
multiple sample spaces. I disagree with this interpretation,
even if not uncommon (see, e.g., the overview of interpretations
in Khrennikov, 2009c). Kolmogorov’s monograph ties
the notion of a sample space^4 to “a complex of conditions
which allows of any number of repetitions” (Kolmogorov,
1933/1956, §2 of Chapter 1): this can be interpreted as a
position very close to if not the same as the one argued for
below. Whatever the correct interpretation, however, the
notion of a single sample space for all random variables is
untenable as it contradicts the common mathematical practices
in dealing with random variables. In Dzhafarov and
Kujala (2014a) we presented the following two arguments
demonstrating this.

^2 Note the terminological variance: “sample space” is more often than
not used in the literature to designate just the set S rather than the
entire domain probability space $(S, \Sigma, \mu)$. I find it more in line with
the general meaning of the terms “set” and “space” in mathematics
to refer to S as a sample set (or set of possible outcomes). This set
is promoted into a space by endowing it with a structure, which in
this case is provided by the sigma-algebra $\Sigma$ and the measure $\mu$.

^3 Kolmogorov (1933/1956) only considered the case when $S_X$ is a
subset of reals and $\Sigma_X$ is the Borel sigma-algebra restricted to this
subset. In this paper I use the term “random variable” in the broad
sense, with no restrictions imposed on $(S_X, \Sigma_X)$. Some authors
prefer to use the term “random element” or “random entity” to
designate random variables in the broad sense.

First of all, for any choice of a universal sample space
$(S, \Sigma, \mu)$ all random variables X defined on it will have the
cardinality of their set of possible values, defined as $S_X =
X(S)$, less than or equal to the cardinality of S. There
is, however, no justification, empirical or mathematical, for
limiting the cardinality of the set $S_X$ of all possible values
for a random variable X.^5

Second, even if we confine our attention to very simple
random variables with one and the same distribution, there
is no justification, empirical or mathematical, for limiting
this set in any way. One can always add a new random
variable to any given set thereof. Thus, given any set $\mathcal{X}$
of unit-normally distributed random variables, one can introduce
a unit-normally distributed $Y_{\mathcal{X}}$ such that its correlation
with any $X \in \mathcal{X}$ is zero. If there were a universal
sample space $(S, \Sigma, \mu)$, then there would be a definite set
$\mathcal{X}^*$ of all possible unit-normally distributed random variables.
But this would mean that our $Y_{\mathcal{X}^*}$ would have to
belong to this set, which is impossible, as $Y_{\mathcal{X}^*}$ cannot have
zero correlation with itself.

To further appreciate the untenability of a universal sample
space, observe that the identity mapping from this space
into itself is a random variable, R. The idea of a universal
sample space therefore is equivalent to the existence
of a random variable R of which all imaginable random
variables are functions. This does not add a new formal
argument against the idea, but it seems especially demonstrative:
what could this mysterious “super-variable” R be?


II.4. A reinterpreted (or revised?) KPT 

All these considerations lead us to a different picture of
the KPT, in which there are different, stochastically unrelated
random variables R, R', R'', ... , corresponding to
different, mutually exclusive conditions under which they
are observed,^6 and for each of these random variables one
can define various functions of it,

$$\begin{aligned}
X &= f(R), & Y &= g(R), \ldots \\
X' &= f'(R'), & Y' &= g'(R'), \ldots \\
X'' &= f''(R''), & Y'' &= g''(R''), \ldots
\end{aligned} \tag{7}$$

so that any two random variables that are functions of one
and the same member of the set R, R', R'', ... possess a joint
distribution, while any functions of two different members
of this set do not. To preserve Kolmogorov’s definition of
a random variable, the R, R', R'', ... can be thought of as
identity functions on their separate sample spaces.

^4 Kolmogorov’s terminology is not the same as the modern terminology
(or variant thereof) I use in this paper; in particular, he does
not speak of a sample space but of a “basic set with an algebra of
subsets.”

^5 Kolmogorov (1933/1956) did not have to deal with this issue, as
the random variables in this book are confined to real-valued ones.

This picture is a step in the right direction, but it is still
flawed if we think of R, R', R'', ... as some fixed set comprising
all pairwise stochastically unrelated random variables.
The reason for this is that if R, R', R'', ... were a fixed set,
with the corresponding sample spaces (which, since they are
identity functions, are simultaneously their distributions)

$$(S_R, \Sigma_R, \mu_R), \; (S_{R'}, \Sigma_{R'}, \mu_{R'}), \; (S_{R''}, \Sigma_{R''}, \mu_{R''}), \ldots \tag{8}$$

then one could form a single random variable R* of which
R, R', R'', ... (hence also all other random variables imaginable)
were functions. This “super-variable” R* would have
the set and sigma-algebra that are products of, respectively,
sets and sigma-algebras in (8), and it would have a probability
measure $\nu$ from which $\mu_R, \mu_{R'}, \mu_{R''}, \ldots$ are computed
as marginals, e.g.,

$$\mu_R(E_R) = \nu(E_R \times S_{R'} \times S_{R''} \times \ldots),$$

for any $E_R \in \Sigma_R$. We have seen already that the idea of
such a “super-variable” is untenable.

A logically consistent way out of this difficulty is to consider
R, R', R'', ... as a class with uncertain and/or flexible
membership.^7 Indeed, it should be clear from the previous
discussion that random variables can be freely introduced,
so, e.g., there is no fixed set of random variables with any
given distribution. Some random variables we observe have
an empirically defined coupling scheme, and then they are
jointly distributed. Other sets of random variables we observe
are observed under different conditions each and do
not have an empirical coupling. Then they can be modeled
as stochastically unrelated random variables. However, we
then can create “copies” of these random variables and couple
them “on paper” in a multitude of ways. This seems
to be a consistent view of random variables. KPT is by
no means dismissed in this view, because any distribution
$(S_X, \Sigma_X, \mu_X)$ for a random variable X is a probability space
subject to Kolmogorov’s axioms:

1. $\mu_X$ is a function $\Sigma_X \to [0, \infty)$;

2. $\mu_X(S_X) = 1$;

3. $\mu_X\left(\bigcup_{i=1}^{\infty} E_X^{(i)}\right) = \sum_{i=1}^{\infty} \mu_X\left(E_X^{(i)}\right)$ for any sequence
of pairwise disjoint $E_X^{(1)}, E_X^{(2)}, \ldots$ in $\Sigma_X$.

Moreover, insofar as one focuses on a given set of jointly
distributed random variables, all of them can be presented
as measurable functions on a single sample space (or functions
of a single random variable).

^6 The notation R, R', R'', ... is informal and should not be interpreted
as indicating a countable set.

^7 A different approach is presented in Dzhafarov and Kujala (2015b),
where we formally define, by means of a quasi-constructive procedure,
the set of all random variables considered “existing” in a given
study.


II.5. Radical contextualism 

Is there a unique way of determining which random variables
are and which are not stochastically interrelated? A
general answer to this question is negative: the definition of
a jointly distributed set of random variables involves an empirical
procedure of coupling their realizations, “observing
them together.” The meaning of such an empirical procedure
may be different for different situations and different
observers. From the mathematical point of view, however,
the question is about a language that makes the fundamental
distinctions between stochastically related and stochastically
unrelated random variables. Such a language is proposed
in Dzhafarov and Kujala (2014b), Dzhafarov, Kujala,
and Larsson (2015), Kujala and Dzhafarov (2015), and Kujala,
Dzhafarov, and Larsson (2015). For an overview, see
Dzhafarov and Kujala (2015a) and Dzhafarov, Kujala, and
Cervantes (2016).

It is postulated that every random variable’s identity is
determined by two types of variables, referred to as objects
(also properties, entities, contents, etc.) and contexts
(also conditions, environment, etc.). Intuitively, the random
variables are treated as “measurements,” the objects
answer the question “what is measured?” whereas the contexts
answer the question “how is it measured?”

Let Q be a set of objects and C a set of contexts considered
in a given study. The mentioning of “a given study” is
essential: in a different study one could choose a different
set of objects to measure and a different set of contexts in
which to measure them. The measurement R of an object
$q \in Q$ in a context $c \in C$ is denoted by $R_q^c$.

The meaning of a context is that it provides an empirical
coupling for the measurements within this context: the
random variables $R_q^c$ with different q measured within the
same context c are “measured together,” i.e., they possess
a joint distribution. Denoting by $Q_c$ the subset of objects
in Q that are measured in context c,

$$R^c = \left\{ R_q^c : q \in Q_c \right\} \tag{9}$$

is a random variable (which implies that it has a distribution,
and this distribution is a joint distribution of its
components). On the other hand, any two random variables
$R_q^c$ and $R_{q'}^{c'}$ with $c \neq c'$ are stochastically unrelated,
whether q and q' are distinct objects or not. It follows that
any random variables $R^c$ and $R^{c'}$ defined as in (9) with
$c \neq c'$ are stochastically unrelated.

The idea of such contextual notation and (mutatis mutandis)
understanding of stochastic unrelatedness have precursors
and analogues in the quantum-physical literature:
see Khrennikov (2005, 2008, 2009a-c), Simon, Brukner, and
Zeilinger (2001), Larsson (2002), Svozil (2012), and Winter
(2014). Khrennikov (2009c) points out that the contextual
understanding of random variables is an intrinsic feature of
von Mises’s “ensemble approach” to probabilities: the identity
of an “ensemble” of observations corresponds to the context
in which these observations are made.^8

Three aspects of our theory, however, set it apart from
this literature:

1. Contextual labeling is universal, and no two random
variables recorded in different contexts have a joint
distribution.

2. Pairwise stochastically unrelated random variables
$R^c$ (each of which is a set of jointly distributed
random variables) can be coupled at will, with no
coupling being privileged.

3. The random variables $R^c$ can be characterized
by whether it is possible or impossible to couple them
in a particular way (e.g., by a maximally connected
coupling, as discussed in Section III.1).

Below I will give an example of how these principles work in 
solving the problem of selective influences in psychology, as 
well as its generalized version, the problem of contextuality,
primarily studied in quantum mechanics. First, however, 
I have to address some obvious objections to the radical 
contextualism. 


II.6. Possible objections 

The first objection is that it is impossible to take into 
account all conditions in the world, and without knowing 
them one would not know if one deals with stochastically 
related or unrelated random variables. The response to 
this objection lies in the qualification “in a given study” 
I made when I introduced object sets Q and context sets 
C. The identification of random variables by what they 
measure and by how they measure it depends on what other 
variables in the world one records and relates to realizations 
of the random variables in question. 

To give an example, let there be a very large group of
husband-and-wife couples; to each of the husbands Alice
poses one of two different Yes/No questions, $a_1$ or $a_2$; to
each of the wives Bob poses one of two different Yes/No
questions, $b_1$ or $b_2$ (that may be the same as or different
from $a_1$, $a_2$). Alice decides (this is not a matter of truth or
falsity but one of convention) to consider the responses to
$a_1$ and $a_2$ in the group of husbands as realizations of random
variables $R_{a_1}$ and $R_{a_2}$, respectively; and Bob defines
$R_{b_1}$ and $R_{b_2}$ for the group of wives analogously. This labeling
indicates that Alice treats $a_1$ and $a_2$ as objects being
measured (by responses to these questions), and so does
Bob for $b_1$ and $b_2$.^9

Let us ask now: what are the contexts in which Alice
records her $R_{a_1}$ and $R_{a_2}$? By the rules of the survey each
person answers a single question, so asking $a_1$ excludes asking
$a_2$ and vice versa. In other words, the conditions under
which one records answers to $a_1$ and $a_2$ are incompatible.
Formally, this means that $a_1$ is measured in the context $a_1$
while $a_2$ is measured in the context $a_2$: Alice has therefore
stochastically unrelated random variables $R_{a_1}^{a_1}$ and $R_{a_2}^{a_2}$.
Analogously, Bob has stochastically unrelated $R_{b_1}^{b_1}$ and $R_{b_2}^{b_2}$.
Their stochastic unrelatedness is quite obvious: why should
any response by a person to $a_1$ (or $b_1$) be paired with any
response by another person to $a_2$ (respectively, $b_2$)?

By the same argument, either of Alice’s measurements is
stochastically unrelated to either of Bob’s: the four measurements

$$R_{a_1}^{a_1}, \; R_{a_2}^{a_2}, \; R_{b_1}^{b_1}, \; R_{b_2}^{b_2} \tag{10}$$

are made in four different contexts. It is clear, however, 
that Alice and Bob could try to form joint distributions of 
their measurements using some empirical coupling proce¬ 
dure, e.g., the pairing of the measurements by the marital 
relation: that is, pairing a husband’s response to with 
his wife’s response to bj , for each of the four combinations 
of z = 1,2 and j = 1, 2. To do this means to form new con¬ 
texts, (ai, 6 i), ( 01 , 62 ), ( 02 , 61 ), and ( 02 , 62 ), and to re-label 
Alice’s random variables as 


T >{ ai , bi ) 0(01,62) 0(02,61) 0(02,62) 


while Bob’s random variables become 


^(01,61) jj (02,6 i ) ^^(01,62) ^^(02,62) 
’ %i ’ *"62 ’ ^ b 2 


( 11 ) 


( 12 ) 


The previous four pairwise stochastically unrelated variables (10) are replaced now with the four pairwise stochastically unrelated variables

$$R^{(a_i,b_j)} = \left(R_{a_i}^{(a_i,b_j)}, R_{b_j}^{(a_i,b_j)}\right),\quad i,j \in \{1,2\}. \tag{13}$$

There is no justification for saying that either of these representations, (10) or (13), is more “correct” than the other: Alice who does not know whose husband her respondent is and Alice who knows this deal with different sets of random variables.
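The empirical coupling by the marital relation can be sketched in a few lines. This is only an illustration under stated assumptions: the couple identifiers, the answer data, and the function names `pair_by_couple` and `joint_frequencies` are hypothetical and do not come from the survey described above.

```python
from collections import Counter

def pair_by_couple(husband_answers, wife_answers):
    """Empirical coupling: pair the responses that share a couple identifier."""
    shared = sorted(set(husband_answers) & set(wife_answers))
    return [(husband_answers[c], wife_answers[c]) for c in shared]

def joint_frequencies(pairs):
    """Relative frequencies of the paired outcomes (an empirical joint distribution)."""
    counts = Counter(pairs)
    return {outcome: k / len(pairs) for outcome, k in counts.items()}

# Hypothetical data: couple id -> answer (1 = Yes, 0 = No).
husbands_a1 = {"c1": 1, "c2": 0, "c3": 1, "c4": 1}  # husbands asked a_1
wives_b1 = {"c1": 1, "c2": 0, "c3": 0, "c4": 1}     # their wives asked b_1

pairs = pair_by_couple(husbands_a1, wives_b1)
joint = joint_frequencies(pairs)
print(joint)
```

Without the couple identifiers there is nothing in the two separate lists of answers that singles out this pairing; the joint frequencies exist only because the pairing rule creates the new context $(a_1, b_1)$.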

Another, related objection is that radical contextualism should lead to considering every realization of a random variable as being stochastically unrelated to every other realization. In the previous example, if Alice records the identities of the people she is posing the question $a_1$ to, then in place of a single $R_{a_1}^{a_1}$ she creates random variables $R_{a_1}^{(a_1,\mathrm{John})}$, $R_{a_1}^{(a_1,\mathrm{Peter})}$, etc., each with a single realization. In a typical behavioral experiment, John, Peter, etc. can be replaced with “trial 1,” “trial 2,” etc. As the contexts (upper indexes) differ, the variables are stochastically unrelated. Isn’t this a problem? In particular, does not stochastic unrelatedness of the realizations of a random variable clash with the standard statistical practice of viewing them as independent identically distributed variables?

[Footnote: Khrennikov thinks that in this respect von Mises’s approach is radically different from Kolmogorov’s, an opinion one can disagree with if the KPT is not confined to a single probability space.]

[Footnote: This example is formally equivalent to the EPR/Bohm experiment in quantum physics (see Section II.7).]

The answer to these questions is essentially the same as to the previous objection. Alice does not have to record the identity of the people she queries, and if she does not, then $R_{a_1}^{a_1}$ is the random variable she forms. If she does the recording, then she creates new contexts, and it is indeed true then that $R_{a_1}^{(a_1,\mathrm{John})}$, $R_{a_1}^{(a_1,\mathrm{Peter})}$, etc., are pairwise stochastically unrelated. They can, however, be coupled, and as always when coupling is not based on an empirical procedure, this can be done in a multitude of ways. One possible coupling is the independent coupling, another is the identity coupling, and one could create an infinity of other couplings.

This may be difficult to understand. Suppose we know that John responded Yes, Peter responded No, Paul responded No, etc.; how then could one speak of the identity coupling? Or, if we know that the response in trial $n + 1$ repeats the response in trial $n$ with probability 0.7, how can one speak then of the independent coupling? To answer these questions one should recall that by coupling pairwise stochastically unrelated $R^{(1)}, R^{(2)}, R^{(3)}, \ldots$ one does not mysteriously transform them into jointly distributed random variables. Instead one creates a new sequence $(\tilde R^{(1)}, \tilde R^{(2)}, \tilde R^{(3)}, \ldots)$, in which each $\tilde R^{(i)}$ has the same distribution as $R^{(i)}$, and all these components have a joint distribution. The table below demonstrates the logical structure of the identity coupling:

$$
\begin{array}{c|cccc}
 & R^{(1)} & R^{(2)} & R^{(3)} & \cdots \\ \hline
\tilde R^{(1)} & \boxed{r_1} & r_2 & r_3 & \cdots \\
\tilde R^{(2)} & r_1 & \boxed{r_2} & r_3 & \cdots \\
\tilde R^{(3)} & r_1 & r_2 & \boxed{r_3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{array}
\tag{14}
$$

The boxed values are the ones factually observed; the rest of the values in each column are those attained by the corresponding components of the identity coupling. As we see, this has nothing to do with the observed values being or not being equal to each other.

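The next table demonstrates the logical structure of the independent coupling:

$$
\begin{array}{c|cccc}
 & R^{(1)} & R^{(2)} & R^{(3)} & \cdots \\ \hline
\tilde R^{(1)} & \boxed{r_1} & r'_2 & r'_3 & \cdots \\
\tilde R^{(2)} & r'_1 & \boxed{r_2} & r''_3 & \cdots \\
\tilde R^{(3)} & r''_1 & r''_2 & \boxed{r_3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{array}
\tag{15}
$$

The boxed values, again, are those factually observed, and the primed values in the $i$th column are sampled from a coupling $(\tilde R^{(1)}, \tilde R^{(2)}, \tilde R^{(3)}, \ldots)$ with stochastically independent components and $\tilde R^{(i)} = r_i$. Again, this has nothing to do with the observed values forming or not forming a sequence with certain statistical properties.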
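The difference between the two couplings can be made concrete with a small sketch. It assumes, for illustration only, that every variable is binary (1 = Yes, 0 = No) with the same probability `p` of Yes; the function names and the numbers are hypothetical. Both constructions yield a joint distribution with the prescribed marginals, and so does any convex mixture of them, which is one way to see that the couplings are infinite in number.

```python
from itertools import product

def identity_coupling(p, n):
    """Joint pmf on {0,1}^n whose components are all equal with probability 1."""
    joint = {}
    for outcome in product((0, 1), repeat=n):
        if all(outcome):
            joint[outcome] = p          # every component says Yes
        elif not any(outcome):
            joint[outcome] = 1 - p      # every component says No
        else:
            joint[outcome] = 0.0        # mixed outcomes never occur
    return joint

def independent_coupling(p, n):
    """Joint pmf on {0,1}^n with stochastically independent components."""
    joint = {}
    for outcome in product((0, 1), repeat=n):
        prob = 1.0
        for value in outcome:
            prob *= p if value == 1 else 1 - p
        joint[outcome] = prob
    return joint

def mix(first, second, weight):
    """Convex mixture of two couplings; it has the same marginals as both."""
    return {o: weight * first[o] + (1 - weight) * second[o] for o in first}

def marginal(joint, index):
    """Pr[component `index` equals 1] under the joint pmf."""
    return sum(prob for outcome, prob in joint.items() if outcome[index] == 1)

p, n = 0.3, 3  # hypothetical marginal probability and number of trials
ident = identity_coupling(p, n)
indep = independent_coupling(p, n)
mixed = mix(ident, indep, 0.5)
for name, joint in (("identity", ident), ("independent", indep), ("mixture", mixed)):
    print(name, [round(marginal(joint, i), 6) for i in range(n)])
```

All three joint distributions reproduce the same marginals, so the marginals alone cannot single out any one of them as the "correct" coupling.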

Focusing on the statistical properties of the observed (“boxed”) values means, formally, that the observations in different trials (or responses from different persons) are treated as objects rather than contexts, all these objects being measured in a single context and therefore jointly distributed:

$$R_1^{(1,2,3,\ldots)},\quad R_2^{(1,2,3,\ldots)},\quad R_3^{(1,2,3,\ldots)},\quad \ldots \tag{16}$$


A third objection one can raise against the radically contextual reinterpretation (or revision) of the KPT is that the notions of an “object” and a “context” are not mathematically defined: they are primitives of the language proposed. How can one know what objects and what contexts to invoke in a specific situation? The response to this is that it is indeed not a mathematical issue. Mathematical analysis begins once one has specified a set $Q$ of objects and a set $C$ of contexts, and there is no single correct way of doing it.

Consider, e.g., the situation when two questions are asked in one of two orders, $a \to b$ or $b \to a$. One can take $a$ to be the same object measured in two different contexts, and similarly for $b$, forming thereby four random variables (responses to the questions)

$$R_a^{a\to b},\quad R_b^{a\to b},\quad R_a^{b\to a},\quad R_b^{b\to a}. \tag{17}$$

By our rules, they are grouped into two stochastically unrelated random variables

$$R^{a\to b} = \left(R_a^{a\to b}, R_b^{a\to b}\right) \quad\text{and}\quad R^{b\to a} = \left(R_a^{b\to a}, R_b^{b\to a}\right). \tag{18}$$

This view of the situation leads to an interesting contextual analysis (Dzhafarov, Zhang, & Kujala, 2015).

It is, however, possible to deny that “the same question $a$” means “the same object $a$” in the two contexts: one can maintain instead that $a$ asked first is simply a different object from $a$ asked second; and similarly for $b$. In this view we have four objects, $a_1, a_2, b_1, b_2$ (where the index indicates whether the question is asked first or second), measured in two contexts, $a_1 \to b_2$ and $b_1 \to a_2$. One ends up with two stochastically unrelated random variables

$$R^{a_1\to b_2} = \left(R_{a_1}^{a_1\to b_2}, R_{b_2}^{a_1\to b_2}\right) \quad\text{and}\quad R^{b_1\to a_2} = \left(R_{b_1}^{b_1\to a_2}, R_{a_2}^{b_1\to a_2}\right). \tag{19}$$

This representation allows for no nontrivial contextual analysis (see below), as the stochastically unrelated random variables have no objects in common. It is, nevertheless, as legitimate a representation as the previous one. A psychologist will most probably choose (18) over (19) (Wang & Busemeyer, 2013; Wang et al., 2014), but it is not mathematics that dictates this choice.


II.7. An example of contextual analysis


The problem of selective influences was introduced to psychology by Sternberg (1969) and developed through a series of publications (Schweickert & Townsend, 1989; Townsend, 1984, 1990; Townsend & Schweickert, 1989; Roberts & Sternberg, 1993; Townsend & Nozawa, 1995; Schweickert, Giorgini, & Dzhafarov, 2000; Dzhafarov, 2003; Dzhafarov, Schweickert, & Sung, 2005; Kujala & Dzhafarov, 2008; Dzhafarov & Kujala, 2010). Later, a link was established between this problem and the quantum-mechanical analysis of entanglement (Dzhafarov & Kujala, 2012a-b, 2013, 2014c) and, more generally, probabilistic contextuality (Dzhafarov & Kujala, 2014a-b, 2015a-b; Dzhafarov, Kujala, & Larsson, 2015; Kujala, Dzhafarov, & Larsson, 2015).

I will formulate the problem using the contextual language introduced above. Let there be a system acted upon by two inputs, $\alpha$ and $\beta$, and reacting by two simultaneous distinct responses, $R_\alpha$ and $R_\beta$ (or distinct aspects of the same response, such as response time and response accuracy). The indexation here reflects the belief (or hypothesis) that $R_\alpha$ is “primarily” influenced by $\alpha$ and $R_\beta$ by $\beta$. One can also say that $R_\alpha$ measures $\alpha$ and $R_\beta$ measures $\beta$. The question is whether $R_\alpha$ is also influenced by $\beta$ and/or $R_\beta$ is also influenced by $\alpha$. Let us simplify the problem by assuming that $\alpha \in \{1,2\}$ and $\beta \in \{1,2\}$, and they vary in a completely crossed factorial design, $\{1,2\} \times \{1,2\}$. Each of the treatments $(\alpha, \beta) = (i, j)$ should be considered a context, so the responses of the system must be labeled

$$R_{\alpha=i}^{(\alpha=i,\beta=j)},\quad R_{\beta=j}^{(\alpha=i,\beta=j)},\quad i,j \in \{1,2\}. \tag{20}$$

To recall the interpretation, $R_{\alpha=i}^{(\alpha=i,\beta=j)}$ measures the object $\alpha = i$ in the context $(\alpha = i, \beta = j)$; $R_{\beta=j}^{(\alpha=i,\beta=j)}$ measures the object $\beta = j$ in the same context; being in the same context, these two measurements form a random variable $R^{(\alpha=i,\beta=j)}$ (whose components possess a joint distribution); however, the four random variables $R^{(\alpha=i,\beta=j)}$ are pairwise stochastically unrelated. To lighten the notation, let us put

$$R_{\alpha=i}^{(\alpha=i,\beta=j)} = A_i^{ij},\quad R_{\beta=j}^{(\alpha=i,\beta=j)} = B_j^{ij}. \tag{21}$$


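According to the definition of selective influences given in Dzhafarov (2003) and elaborated in Dzhafarov and Kujala (2010), one says that $A_i^{ij}$ is not influenced by $\beta$ and $B_j^{ij}$ is not influenced by $\alpha$ (for all $i, j$) if one can find a coupling

$$\left(\tilde A_1^{11}, \tilde B_1^{11}, \tilde A_1^{12}, \tilde B_2^{12}, \tilde A_2^{21}, \tilde B_1^{21}, \tilde A_2^{22}, \tilde B_2^{22}\right) \tag{22}$$

in which the equalities

$$\tilde A_1^{11} = \tilde A_1^{12},\quad \tilde A_2^{21} = \tilde A_2^{22},\quad \tilde B_1^{11} = \tilde B_1^{21},\quad \tilde B_2^{12} = \tilde B_2^{22} \tag{23}$$

hold with probability 1. Put differently, the random variables $\tilde A_1^{11}$ and $\tilde A_1^{12}$ in the joint distribution of (22) always attain one and the same value, even though the value of $\beta$ changes; and analogously for the remaining three equalities. Note that, by the definition of a coupling,

$$\left(\tilde A_i^{ij}, \tilde B_j^{ij}\right) \overset{d}{=} \left(A_i^{ij}, B_j^{ij}\right),\quad i,j \in \{1,2\}, \tag{24}$$

where $\overset{d}{=}$ means “has the same distribution as.”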

If all $A$ and $B$ responses of the system have a finite number of possible values, this situation generalizes Bohm’s version of the Einstein-Podolsky-Rosen (EPR) paradigm (Bohm & Aharonov, 1957; Bell, 1964). Of course, the distributions of the $A$, $B$ need not be generally in compliance with the quantum rules for entangled particles, but the existence or nonexistence of a coupling with the stipulated properties should be determinable for any observed $A$ and $B$. Let us assume for simplicity that both $A$ and $B$ responses of the system are binary, and let us denote their values $+1$ and $-1$. In this special case the necessary and sufficient conditions for the selectiveness of influences are given by

$$A_i^{i1} \overset{d}{=} A_i^{i2} \text{ for } i = 1,2, \qquad B_j^{1j} \overset{d}{=} B_j^{2j} \text{ for } j = 1,2 \tag{25}$$

and

$$\max_{k,l \in \{1,2\}} \left| \sum_{i,j \in \{1,2\}} \mathrm{E}\left[A_i^{ij} B_j^{ij}\right] - 2\,\mathrm{E}\left[A_k^{kl} B_l^{kl}\right] \right| \le 2, \tag{26}$$

where E stands for expected value. This (in an algebraically different form) was first proved by Fine (1982), although (25) in his work is implied by the notation rather than stated explicitly. The distributional equalities (25) describe the condition known as marginal selectivity: the distribution of $A_i^{ij}$ does not change with the value $j$ of $\beta$, and the distribution of $B_j^{ij}$ does not change with the value $i$ of $\alpha$. The numerical inequality (26) is known as the CHSH inequality (after the authors of Clauser et al., 1969). In quantum mechanics, a violation of this inequality when the marginal selectivity (25) holds is described by saying that the system is contextual (see, e.g., Kurzynski, Ramanathan, & Kaszlikowski, 2012).
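The left-hand side of the CHSH inequality (26) is easy to evaluate once the four product expectations are known. The following is a minimal sketch; the function name `chsh_statistic` and the numerical expectations are hypothetical illustrations, not data from any experiment cited here.

```python
import math

def chsh_statistic(eab):
    """Left-hand side of (26).

    eab[(i, j)] is the expectation E[A_i^{ij} B_j^{ij}] for i, j in {1, 2}.
    """
    keys = [(i, j) for i in (1, 2) for j in (1, 2)]
    total = sum(eab[key] for key in keys)
    return max(abs(total - 2 * eab[key]) for key in keys)

# Hypothetical expectations: all four products equal +1 with certainty.
all_equal = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 1}

# Hypothetical expectations of magnitude 1/sqrt(2) with one sign flipped.
c = 1 / math.sqrt(2)
one_flipped = {(1, 1): c, (1, 2): c, (2, 1): c, (2, 2): -c}

print(chsh_statistic(all_equal))    # equals 2, so (26) holds
print(chsh_statistic(one_flipped))  # exceeds 2, so (26) is violated
```

The statistic is only meaningful as a test of selective influences when the marginal selectivity (25) holds, so in practice that condition would be checked first.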

If the marginal selectivity (25) is violated, the CHSH inequality (26) cannot be derived, and it makes no difference whether it is satisfied or not. Moreover, if marginal selectivity is violated, it seems unnecessary to look at anything else: clearly then $A$ is not selectively influenced by $\alpha$ alone, and/or $B$ is not selectively influenced by $\beta$ alone. As it turns out, however, one may still be interested in the question: is the influence of $\beta$ upon $A$ and/or of $\alpha$ upon $B$ entirely described by the violations of marginal selectivity? Indeed, since the CHSH inequality (26) may very well be violated when the marginal selectivity (25) holds, and since we then conclude that selectiveness of influences is violated too, we have to admit that the “wrong” influences (from $\beta$ to $A$ and/or from $\alpha$ to $B$) can be indirect, without manifesting themselves in changed marginal distributions. This leads us to a generalized notion of contextuality (Dzhafarov, Kujala, & Larsson, 2015; Kujala, Dzhafarov, & Larsson, 2015; Dzhafarov & Kujala, 2015a-b; Dzhafarov, Zhang, & Kujala, 2015; Dzhafarov, Kujala, & Cervantes, 2016).

When applied to our example with two binary inputs $\alpha$, $\beta$ and two binary random outputs $A$, $B$, the definition is as follows. A system

$$\left\{\left(A_i^{ij}, B_j^{ij}\right) : i,j \in \{1,2\}\right\}$$

is noncontextual if it has a maximally connected coupling. The latter is defined as a coupling in which each of the equalities (23) holds with the maximal possible probability that is allowed by the individual distributions of the random variables. To explain, if $A_1^{11} \overset{d}{=} A_1^{12}$, then the maximal possible value for $\Pr\left[\tilde A_1^{11} = \tilde A_1^{12}\right]$ is 1. Applying this to all other equalities in (23), we get the previous definition. If, however, $A_1^{11}$ and $A_1^{12}$ have different distributions, then the maximal possible value for $\Pr\left[\tilde A_1^{11} = \tilde A_1^{12}\right]$ is

$$
\begin{aligned}
&\min\left\{\Pr\left[A_1^{11} = 1\right], \Pr\left[A_1^{12} = 1\right]\right\} + \min\left\{\Pr\left[A_1^{11} = -1\right], \Pr\left[A_1^{12} = -1\right]\right\} \\
&\quad = 1 - \left|\Pr\left[A_1^{11} = 1\right] - \Pr\left[A_1^{12} = 1\right]\right|.
\end{aligned}
\tag{27}
$$
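The bound (27) can be checked by constructing the coupling that attains it. This is a minimal sketch for two binary ($\pm 1$) variables; the function names and the probabilities 0.7 and 0.4 are hypothetical choices for illustration.

```python
def max_equal_probability(p, q):
    """Maximal Pr[X~ = Y~] over all couplings of binary X, Y.

    p = Pr[X = 1] and q = Pr[Y = 1]; the values of X and Y are +1 and -1.
    """
    return min(p, q) + min(1 - p, 1 - q)

def maximal_coupling(p, q):
    """Joint pmf of (X~, Y~) with the given marginals that attains the maximum."""
    both_plus = min(p, q)            # mass placed on X~ = Y~ = +1
    both_minus = min(1 - p, 1 - q)   # mass placed on X~ = Y~ = -1
    return {
        (1, 1): both_plus,
        (-1, -1): both_minus,
        (1, -1): p - both_plus,      # leftover mass needed to keep Pr[X~ = 1] = p
        (-1, 1): q - both_plus,      # leftover mass needed to keep Pr[Y~ = 1] = q
    }

p, q = 0.7, 0.4  # hypothetical marginal probabilities
joint = maximal_coupling(p, q)
agree = joint[(1, 1)] + joint[(-1, -1)]
print(agree, 1 - abs(p - q))  # the two numbers coincide up to rounding
```

At most one of the two off-diagonal cells carries positive mass, which is why no coupling can make the two variables agree more often.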

If some coupling (22) attains the maximal value (27) and the analogously computed maximal values for the other equalities in (23), then the system is noncontextual: the “wrong” influences in it are all confined to directly changing the distributions of the “wrong” random variables. If no such coupling exists, however, the system is contextual: the influence of $\beta$ upon $A$ and/or $\alpha$ upon $B$ is greater than just distributional changes. As shown in Dzhafarov, Kujala, and Larsson (2015), Kujala, Dzhafarov, and Larsson (2015), and Kujala and Dzhafarov (in press), the necessary and sufficient condition for noncontextuality in accordance with this definition is

$$
\max_{k,l \in \{1,2\}} \left| \sum_{i,j \in \{1,2\}} \mathrm{E}\left[A_i^{ij} B_j^{ij}\right] - 2\,\mathrm{E}\left[A_k^{kl} B_l^{kl}\right] \right|
\le 2 + \sum_{i=1}^{2} \left| \mathrm{E}\left[A_i^{i1}\right] - \mathrm{E}\left[A_i^{i2}\right] \right| + \sum_{j=1}^{2} \left| \mathrm{E}\left[B_j^{1j}\right] - \mathrm{E}\left[B_j^{2j}\right] \right|.
\tag{28}
$$


For application of this and other criteria of contextuality to 
available experimental data in physics and psychology see, 
respectively, Kujala, Dzhafarov, and Larsson (2015) and 
Dzhafarov, Zhang, and Kujala (2015). 
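The generalized criterion displayed above can be sketched as a single function of the expectations. The function name `is_noncontextual`, the dictionary layout, and the two example systems are hypothetical; the sketch assumes binary $\pm 1$ responses, so that each marginal distribution is determined by its expectation.

```python
def is_noncontextual(eab, ea, eb, tol=1e-12):
    """Check the generalized criterion for a 2 x 2 design with +1/-1 responses.

    eab[(i, j)] = E[A_i^{ij} B_j^{ij}], ea[(i, j)] = E[A_i^{ij}],
    eb[(i, j)] = E[B_j^{ij}], for i, j in {1, 2}.
    """
    keys = [(i, j) for i in (1, 2) for j in (1, 2)]
    total = sum(eab[key] for key in keys)
    lhs = max(abs(total - 2 * eab[key]) for key in keys)
    # Amount by which marginal selectivity is violated.
    delta = sum(abs(ea[(i, 1)] - ea[(i, 2)]) for i in (1, 2))
    delta += sum(abs(eb[(1, j)] - eb[(2, j)]) for j in (1, 2))
    return lhs <= 2 + delta + tol

zero = {(1, 1): 0, (1, 2): 0, (2, 1): 0, (2, 2): 0}
ones = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 1}

# Hypothetical system 1: uniform marginals, three products +1 and one -1.
products = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): -1}
print(is_noncontextual(products, zero, zero))  # False

# Hypothetical system 2: deterministic responses, all +1 except B_2^{22} = -1,
# so the same products arise purely from a change in the distribution of B_2.
eb_det = {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): -1}
print(is_noncontextual(products, ones, eb_det))  # True
```

The two systems share the same product expectations; they differ only in whether the pattern is accounted for by changed marginal distributions, which is exactly what the extra terms on the right-hand side measure.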


III. CONCLUSION 

I have argued in this paper that the KPT (Kolmogorovian probability theory) must allow for stochastically unrelated random variables, and these must not be confused with stochastically independent ones. I have argued for radical contextualism: any two random variables recorded under different conditions (in different contexts) are stochastically unrelated. There is no fixed set of pairwise stochastically unrelated random variables: they can be freely introduced and freely coupled. To couple a given set of stochastically unrelated random variables means to create their jointly distributed “copies” (stochastically unrelated to the “originals”). The couplings for a given set of random variables are typically infinite in number, with no coupling being “more correct” than another. This applies also to couplings with stochastically independent components. The idea Janne Kujala and I have been promoting in recent publications is that stochastically unrelated random variables can be usefully characterized by their possible couplings, in particular, by determining whether these variables allow for couplings subject to certain constraints. I have illustrated this idea on the issue of selective influences, generalized into the issue of probabilistic contextuality.